17 research outputs found

    Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search

    Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability to perform sequence-to-sequence inference directly on embedded devices, thereby reducing the amount of raw data transmitted to the cloud and improving response latency, energy consumption, and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned at runtime to the difficulty of the processed input sequence. Results based on measurements on a real embedded device, and on three state-of-the-art deep learning models, show that our method reduces inference time and energy by up to 25% without loss of accuracy.
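
    The abstract above does not state how the beam width is adapted, so the following is only a minimal sketch of one possible policy, not the authors' method. It assumes a hypothetical `step_fn` that maps a token prefix to next-token probabilities, and it picks a narrow beam whenever the current best hypothesis is confident (top-1 probability at or above an assumed `conf_threshold`) and a wide beam otherwise. Scores are plain summed log-probabilities with no length normalization.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: token prefix -> next-token probabilities.
StepFn = Callable[[Tuple[int, ...]], Sequence[float]]


def dynamic_beam_search(step_fn: StepFn, eos: int, max_len: int,
                        narrow: int = 1, wide: int = 4,
                        conf_threshold: float = 0.8):
    """Beam search whose width is chosen at every step (illustrative policy).

    Returns (best_sequence, calls), where calls counts step_fn invocations
    and serves as a rough proxy for inference cost.
    """
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[int, ...], float]] = []
    calls = 0
    for _ in range(max_len):
        candidates = []
        width = wide
        for rank, (prefix, score) in enumerate(beams):
            probs = step_fn(prefix)
            calls += 1
            # Assumed difficulty signal: confidence of the best hypothesis.
            if rank == 0 and max(probs) >= conf_threshold:
                width = narrow
            for tok, p in enumerate(probs):
                if p > 0.0:
                    candidates.append((prefix + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:width]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[1])
    return best[0], calls


def easy_model(prefix: Tuple[int, ...]) -> Sequence[float]:
    """Toy stand-in for a decoder: emits token 0 three times, then EOS (2)."""
    return [0.9, 0.05, 0.05] if len(prefix) < 3 else [0.05, 0.05, 0.9]


# Dynamic policy versus a fixed wide beam (a threshold above 1 never narrows).
seq_dyn, calls_dyn = dynamic_beam_search(easy_model, eos=2, max_len=6)
seq_fixed, calls_fixed = dynamic_beam_search(easy_model, eos=2, max_len=6,
                                             conf_threshold=2.0)
print(seq_dyn, calls_dyn, seq_fixed, calls_fixed)
```

    On this toy input both runs return the same sequence, while the dynamic run calls the model fewer times. Whether such savings hold on real models depends on the confidence signal and thresholds, which are assumptions here.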

    Two-stage Human Activity Recognition on Microcontrollers with Decision Trees and CNNs

    Human Activity Recognition (HAR) has become an increasingly popular task for embedded devices such as smartwatches. Most HAR systems for ultra-low-power devices are based on classic Machine Learning (ML) models, whereas Deep Learning (DL), although reaching state-of-the-art accuracy, is less popular due to its high energy consumption, which poses a significant challenge for battery-operated and resource-constrained devices. In this work, we bridge the gap between on-device HAR and DL thanks to a hierarchical architecture composed of a decision tree (DT) and a one-dimensional Convolutional Neural Network (1D CNN). The two classifiers operate in a cascaded fashion on two different sub-tasks: the DT classifies only the easiest activities, while the CNN deals with more complex ones. With experiments on a state-of-the-art dataset and targeting a single-core RISC-V MCU, we show that this approach saves up to 67.7% energy with respect to a 'stand-alone' DL architecture at iso-accuracy. Additionally, the two-stage system either introduces a negligible memory overhead (up to 200 B) or, on the contrary, reduces the total memory occupation.
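
    The cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions: `easy_stage` is a hypothetical variance threshold standing in for the decision tree, `hard_stage` is a hypothetical rule standing in for the 1D CNN, and the labels and thresholds are invented for the example. Only the control flow (cheap stage first, expensive stage on deferral) reflects the abstract.

```python
from statistics import pvariance
from typing import Dict, Optional, Sequence


def easy_stage(window: Sequence[float]) -> Optional[str]:
    """Stand-in for the decision tree: label easy windows, else defer (None)."""
    return "still" if pvariance(window) < 0.01 else None


def hard_stage(window: Sequence[float]) -> str:
    """Stand-in for the 1D CNN: only runs on windows the first stage deferred."""
    diffs = [abs(b - a) for a, b in zip(window, window[1:])]
    return "running" if sum(diffs) / len(diffs) > 1.0 else "walking"


def cascade(window: Sequence[float], stats: Dict[str, int]) -> str:
    """Two-stage inference: the costly model is invoked only when needed."""
    label = easy_stage(window)
    if label is not None:
        stats["stage1"] += 1
        return label
    stats["stage2"] += 1
    return hard_stage(window)


stats = {"stage1": 0, "stage2": 0}
windows = [
    [0.0, 0.0, 0.01, 0.0],  # almost constant signal
    [0.0, 0.5, 0.0, 0.5],   # moderate motion
    [0.0, 2.0, 0.0, 2.0],   # strong motion
]
labels = [cascade(w, stats) for w in windows]
print(labels, stats)
```

    The energy saving in such a scheme comes from how often the first stage resolves the input on its own, so the `stats` counters are the quantity to watch when tuning the split between the two sub-tasks.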

    Ambipolar suppression of superconductivity by ionic gating in optimally-doped BaFe2(As,P)2 ultrathin films

    Superconductivity (SC) in the Ba-122 family of iron-based compounds can be controlled by aliovalent or isovalent substitutions, applied external pressure, and strain, the combined effects of which are sometimes studied within the same sample. Most often, the result is limited to a shift of the SC dome to different doping values. In a few cases, the maximum SC transition at optimal doping can also be enhanced. In this work, we study the combination of charge doping together with isovalent P substitution and strain by performing ionic gating experiments on BaFe$_2$(As$_{0.8}$P$_{0.2}$)$_2$ ultrathin films. We show that the polarization of the ionic gate induces modulations to the normal-state transport properties that can be mainly ascribed to surface charge doping. We demonstrate that ionic gating can only shift the system away from the optimal conditions, as the SC transition temperature is suppressed by both electron and hole doping. We also observe a broadening of the resistive transition, which suggests that the SC order parameter is modulated nonhomogeneously across the film thickness, in contrast with earlier reports on charge-doped standard BCS superconductors and cuprates. Comment: 10 pages, 5 figures

    Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of efficient DL architectures for IR array-based people counting that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28 kB of memory and require 1.10-7.74 ms per inference, while consuming 17.18-120.43 µJ of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on an MCU-based IoT node, with years of autonomous operation without battery recharging. Comment: This article has been accepted for publication in IEEE Internet of Things Journal

    Phonon dispersion and lifetimes in MgB2

    We measure phonon dispersion and linewidth in a single crystal of MgB_2 along the Gamma-A, Gamma-M and A-L directions using inelastic X-ray scattering. We use Density Functional Theory to compute the effect of both electron-phonon coupling and anharmonicity on the linewidth, obtaining excellent agreement with experiment. Anomalous broadening of the E_2g phonon mode is found all along Gamma-A. The dominant contribution to the linewidth is always the electron-phonon coupling. Comment: 4 pages, 3 figures

    Privacy-preserving Social Distance Monitoring on Microcontrollers with Low-Resolution Infrared Sensors and CNNs

    Low-resolution infrared (IR) array sensors offer a low-cost, low-power, and privacy-preserving alternative to optical cameras and smartphones/wearables for social distance monitoring in indoor spaces, permitting the recognition of basic shapes without revealing the personal details of individuals. In this work, we demonstrate that accurate detection of social distance violations can be achieved by processing the raw output of an 8x8 IR array sensor with a small-sized Convolutional Neural Network (CNN). Furthermore, the CNN can be executed directly on a Microcontroller (MCU)-based sensor node. With results on a newly collected open dataset, we show that our best CNN achieves 86.3% balanced accuracy, significantly outperforming the 61% achieved by a state-of-the-art deterministic algorithm. By changing the architectural parameters of the CNN, we obtain a rich Pareto set of models, spanning 70.5-86.3% accuracy and 0.18-75k parameters. Deployed on an STM32L476RG MCU, these models have a latency of 0.73-5.33 ms, with an energy consumption per inference of 9.38-68.57 µJ.

    Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

    With the increasing popularity of Internet of Things (IoT) devices, there is a growing need for energy-efficient Machine Learning (ML) models that can run on constrained edge nodes. Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosted Trees (GBTs), are particularly suited for this task, given their relatively low complexity compared to other alternatives. However, their inference time and energy costs are still significant for edge hardware. Given that these costs grow linearly with the ensemble size, this paper proposes the use of dynamic ensembles, which adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input, to trade off computational cost and accuracy. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code, and exploring several optimizations that account for the available parallelism and memory hierarchy. We extensively benchmark both static and dynamic RFs and GBTs on three state-of-the-art IoT-relevant datasets, using an 8-core ultra-low-power System-on-Chip (SoC), GAP8, as the target platform. Thanks to the proposed early-stopping mechanisms, we achieve an energy reduction of up to 37.9% with respect to static GBTs (8.82 µJ vs 14.20 µJ per inference) and 41.7% with respect to static RFs (2.86 µJ vs 4.90 µJ per inference), without losing accuracy compared to the static model.
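
    The abstract above does not give the exact early-stopping rule, so the sketch below shows one plausible form rather than the paper's mechanism. It assumes each tree is a callable returning class probabilities, stops as soon as the averaged margin between the two leading classes reaches an assumed `margin` threshold, and caps the work with a hypothetical `max_trees` budget standing in for the latency/energy target. The `stump` helper and its numbers are invented for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical tree interface: feature vector -> class probabilities.
Tree = Callable[[Sequence[float]], Sequence[float]]


def dynamic_ensemble_predict(trees: List[Tree], x: Sequence[float],
                             margin: float,
                             max_trees: Optional[int] = None) -> Tuple[int, int]:
    """Run trees one at a time and stop early once the prediction is clear.

    Returns (predicted_class, trees_used).
    """
    budget = len(trees) if max_trees is None else min(max_trees, len(trees))
    if budget < 1:
        raise ValueError("at least one tree must be executed")
    totals: List[float] = []
    used = 0
    for tree in trees[:budget]:
        probs = tree(x)
        totals = list(probs) if not totals else [t + p for t, p in zip(totals, probs)]
        used += 1
        ranked = sorted(totals, reverse=True)
        # Assumed stopping rule: averaged gap between the two leading classes.
        if (ranked[0] - ranked[1]) / used >= margin:
            break
    label = max(range(len(totals)), key=totals.__getitem__)
    return label, used


def stump(t: float) -> Tree:
    """Toy two-class tree: confident far from threshold t, unsure near it."""
    def predict(x: Sequence[float]) -> Sequence[float]:
        if x[0] < t - 0.2:
            return [0.95, 0.05]
        if x[0] > t + 0.2:
            return [0.05, 0.95]
        return [0.55, 0.45] if x[0] < t else [0.45, 0.55]
    return predict


forest = [stump(t) for t in (0.6, 0.4, 0.55, 0.45, 0.5)]
easy = dynamic_ensemble_predict(forest, [0.0], margin=0.5)
hard = dynamic_ensemble_predict(forest, [0.5], margin=0.5)
capped = dynamic_ensemble_predict(forest, [0.5], margin=0.5, max_trees=2)
print(easy, hard, capped)
```

    In this toy setup the clear-cut input stops after a single tree, the borderline input consumes the whole ensemble, and the budget cap bounds the cost regardless of input difficulty.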

    Advanced surface characterization of Ba(Fe_(0.92)Co_(0.08))_2As_2 epitaxial thin films

    We report on the systematic characterization of Ba(Fe_(0.92)Co_(0.08))_2As_2 epitaxial thin films on a CaF_2 substrate in view of their possible use for superconducting electronic applications. By using different and complementary techniques, we studied the morphological characteristics of the surface, the structural properties, the magnetic response, and the superconducting properties in terms of critical temperature, critical current, and energy gaps. Particular attention was paid to the homogeneity of the films and to the comparison of their superconducting properties with those of single crystals of the same compound.